NearData

The objective of NEARDATA is to establish a robust data infrastructure that facilitates the movement of data between Object Storage and Data Analytics platforms throughout the Compute Continuum. The novel XtremeDataHub platform serves as a data intermediary that captures and enhances data streams (S3 API, stream APIs) using high-performance near-data connectors (Cloud/Edge). Sano’s role is to create a pipeline for constructing a transcriptomics atlas for specific tissues and diseases, using High-Performance Computing (HPC) and Cloud technologies.

In addition, a suite of tools is to be developed for conducting Federated Learning experiments on large-scale genomics data as part of the Federated Learning framework.

NearData project website: neardata.eu

Duration: 01 January 2023 – 31 December 2025

Granting authority: European Commission, Directorate-General for Communications Networks, Content and Technology

Call: HORIZON-CL4-2022-DATA-01

Publications

Novel Approaches Toward Scalable Composable Workflows in Hyper-Heterogeneous Computing Environments

Authors: Jonathan Bader, Jim Belak, Matthew Bement, Matthew Berry, Robert Carson, Daniela Cassol, Stephen Chan, John Coleman, Kastan Day, Alejandro Duque, Kjiersten Fagnan, Jeff Froula, Shantenu Jha, Daniel S. Katz, Piotr Kica, Volodymyr Kindratenko, Edward Kirton, Ramani Kothadia, Daniel Laney, Fabian Lehmann, Ulf Leser, Sabina Lichołai, Maciej Malawski, Mario Melara, Elais Player, Matt Rolchigo, Setareh Sarrafan, Seung-Jin Sul, Abdullah Syed, Lauritz Thamsen, Mikhail Titov, Matteo Turilli, Silvina Caino-Lores, Anirban Mandal

Optimizing STAR Aligner for High Throughput Computing in the Cloud

Authors: Piotr Kica, Sabina Lichołai, Michał Orzechowski, Maciej Malawski

In September 2024, Kobe, Japan hosted the IEEE International Conference on Cluster Computing, one of the most prominent international conferences on high-performance and cluster computing. The conference showcased state-of-the-art developments and practical applications across disciplines. Researchers from Sano were among the attendees. During the Cluster2024 poster session, Piotr Kica presented his research entitled “Optimizing STAR Aligner for High Throughput Computing in the Cloud,” conducted in collaboration with Sabina Lichołai, Michał Orzechowski, and Maciej Malawski. His presentation detailed a cloud-based bioinformatics pipeline centered on the STAR aligner, emphasizing architectural design and performance improvements. The team achieved notable computational efficiencies through techniques such as “early stopping” and the use of updated STAR index files based on the latest genome builds.

The publication related to the poster is available at https://arxiv.org/abs/2409.05886.